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We introduce a Bayesian multiple regression tree model to charac¬ 
terize relationships between physico-chemical properties of nanopar¬ 
ticles and their in-vitro toxicity over multiple doses and times of 
exposure. Unlike conventional models that rely on data summaries, 
our model solves the low sample size issue and avoids arbitrary loss 
of information by combining all measurements from a general expo¬ 
sure experiment across doses, times of exposure, and replicates. The 
proposed technique integrates Bayesian trees for modeling threshold 
effects and interactions, and penalized B-splines for dose- and time- 
response surface smoothing. The resulting posterior distribution is 
sampled by Markov Chain Monte Carlo. This method allows for in¬ 
ference on a number of quantities of potential interest to substantive 
nanotoxicology, such as the importance of physico-chemical proper¬ 
ties and their marginal effect on toxicity. We illustrate the application 
of our method to the analysis of a library of 24 nano metal oxides. 

1. Introduction. The increasing nse of engineered nanomaterials (ENM) 
in hnndreds of consnmer prodncts has recently raised concern abont their 
potential effect on the environment and hnman health in particular. In nan¬ 
otoxicology, in vitro dose-escalation assays describe how cell lines or sim¬ 
ple organisms are affected by increased exposure to nanoparticles. These 
assays help determine hazardous materials and exposure levels. Standard 
dose-escalation studies are sometimes completed by more general exposure 
escalation protocols, where a biological outcome is measured against both 
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increasing concentrations and dnrations of exposure. Cost and timing issues 
usually only allow for a small number of nanoparticles to be comprehensively 
screened in any study. Therefore, both one- and two-dimensional escala¬ 
tion experiments are often characterized by small sample sizes. Furthermore, 
data exhibits natural clusters related to varying levels of nanoparticles bio¬ 
activity. The two case studies presented in Section 6 provide an overview of 
the structure of typical data sets obtained with both experimental protocols. 

Beyond dose-response analysis, nanomaterial libraries are also designed 
to investigate how a range of physical and chemical properties (size, shape, 
composition, surface characteristics) may influence ENM’s interactions with 
biological systems. The nano-informatics literature reports several Quan¬ 
titative Structure-Activity Relationship (QSAR) models. This exercise is 
conceived as a framework for predictive toxicology, under the assumption 
that nanoparticles with similar properties are likely to have similar effects. 
Most of existing QSAR models summarize or integrate experimental data 
across times, doses and replicates as a preprocessing step, before applying 
traditional data mining or statistical algorithms for regression. For example, 
Liu et al. (2011) use a modified Student’s t-statistic to discretize outputs 
in two classes (toxic or nontoxic) and a logistic regression model to relate 
toxicity to physico-chemical variables. Zhang et al. (2012) use the area un¬ 
der the dose-response curve as a global summary of toxicity and they model 
dependence on predictors via a regression tree. Both approaches, while rea¬ 
sonably sensible, ignore the uncertainty associated with data summaries and 
can lead to unwarranted conclusions as well as unnecessary loss of informa¬ 
tion. Patel et al. (2014) summarize toxicity prohles using a new dehnition of 
toxicity, called the probability of toxicity, which is defined as a linear function 
of nanoparticle physical and chemical properties. While this last approach 
solves the issue of uncertainty propagation, it still makes it impossible to pre¬ 
dict full dose-response curves from nanoparticle characteristics. Moreover, 
the use of regression trees is inherently appealing, as they are able to model 
nonlinear effects and interactions without compromising interpretation. We 
aim to extend regression tree models to account for structured multivari¬ 
ate outcomes, defined as toxicity profiles of nanoparticles, measured over a 
general exposure escalation domain. 

Multivariate extensions of the regression tree methodology have been pro¬ 
posed by Segal (1992). In this paper, the original tree-building algorithm 
of Breiman et al. (1984) is modified to handle multivariate responses for 
commonly used covariance matrices, such as independence or autoregressive 
structures. De’ath (2002) proposes a similar method for an independent co- 
variance structure. Yu and Lambert (1999) develop regression tree models for 
functional data, by representing each individual response as a linear combi¬ 
nation of spline basis functions and using the estimated splines coefficients 
in multivariate regression trees. An alternative for longitudinal responses 
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consists of combining a tree model and a linear model: Sela and Simonoff 
(2012) replace the fixed effects of the traditional linear mixed effects model 
by a regression tree. The linear random effects are unchanged. Yu et al. 
(2010) fit a semi-parametric model, containing a linear part and a tree part, 
for multivariate outcomes in genetics. The linear part is used to model main 
effects of some genetic or environmental exposures. The nonparametric tree 
part approximates the joint effect of these exposures. Finally, Galimberti and 
Montanari (2002) develop regression tree models for longitudinal data with 
time-dependent covariates. In this setting, measures for the same individual 
can belong to different terminal nodes. 

Other extensions of standard regression trees include Bayesian approaches, 
where tree parameters become random variables. Chipman, George and Mc¬ 
Culloch (1998) introduce a Bayesian regression tree model for univariate 
responses. The method is based on a prior distribution and a Metropolis- 
Bastings algorithm which generates candidate trees and identifies the most 
promising ones. This methodology has since been extended to so-called treed 
models, where a parametric model is fitted in each terminal node [Chipman, 
George and McCulloch (2002)], to a sum-of-trees model [Chipman, George 
and McCulloch (2010a)], and to incorporate spatial random effects for merg¬ 
ing data sets [Zhang, Shih and Muller (2007)]. Gramacy and Lee (2008) 
model nonstationary spatial data by combining Bayesian regression trees 
and Gaussian processes in the leaves. This approach is extended to the mul¬ 
tivariate Gaussian process with separable covariance structure in Konomi 
et al. (2014). 

Building on previous contributions, we propose a new method to ana¬ 
lyze the relationship between nanoparticles physico-chemical properties and 
their toxicity in exposure escalation experiments. We extend the Bayesian 
methodology of Chipman, George and McCulloch (1998) to allow for dose- 
and time-response kinetics in terminal nodes. Our work is closely related to 
the methodology introduced in Konomi et al. (2014). However, our model is 
specifically adapted to exposure escalation experiments, as observations for 
the same nanoparticle at different doses and times cannot fall in separate 
leaves of the tree. Therefore, the binary splits of the tree only capture struc¬ 
ture activity relationships instead of the general increase of toxicity with 
exposure. 

A global covariance structure accounts for correlation between measure¬ 
ments at different doses and times for the same nanoparticle. Our approach is 
able to model nonlinear effects and potential interactions of physico-chemical 
properties without making parametric assumptions about toxicity profiles. 
It also addresses the issues associated with conventional QSAR models by 
combining evidence across measurements for all doses and times in a gen¬ 
eral experimental design. The proposed model is particularly versatile, as 


4 


C. LOW-KAM ET AL. 


it provides scores of importance for physico-chemical properties and visual 
assessment of the marginal effect of these properties on toxicity. 

The rest of this paper is organized as follows: Section 2 describes the 
regression model for dose-response data and Section 3 describes the corre¬ 
sponding prior model. The resulting posterior distribution and the associ¬ 
ated MCMC algorithm are presented in Section 4. The model is extended 
to the case of dose- and time-response surfaces in Section 5. The method is 
applied to a library of 24 metal oxides in Section 6 and Section 7 concludes 
this paper with a discussion. 

2. Regression tree formulation. 

2.1. Sampling model. We first consider the case of a typical dose escala¬ 
tion experiment, where a biological outcome is measured over a protocol of 
increased nanoparticle concentration. This case will be expanded in Section 5 
to include more general exposure escalation designs. 

Let yik{d) denote a real-valued response associated with exposure to 
nanoparticle i and replicate k at dose d, for i e {1,..., /}, k G {1,... ,K} 
and dG [0,11]. We assume that y has been appropriately normalized and 
purihed of experimental artifacts. For the two case studies of Section 6, 
normalization was performed for each tray by subtracting a baseline mean 
response, measured in control wells where cells were not exposed to any 
nanoparticle. After normalization, we indeed assume independence between 
wells exposed to different nanomaterials on the same tray. Current experi¬ 
mental protocols only allow for the observation of the outcome y as it varies 
in association with a discrete prescription of dose-escalation. However, for 
notational convenience and without loss of generality, we maintain that y 
shall be observed for any dose level d ranging from no exposure (d = 0) to a 
maximal nanoparticle concentration level {d = D). Let also xi = {xn ,..., Xip) 
be a p-dimensional vector of continuous physico-chemical characteristics or 
predictors associated to nanoparticle i. We assume 

(2.1) yik{d) = f{xi,d) -h Eikid), 

where / is a random mean function, depending on the dose level d and 
nanoparticle characteristics Xj, and Sik ~ A^(0,(T^). More precisely, / is de¬ 
fined by a regression tree T on the predictor space and a functional model 
for dose-response curves in the terminal nodes of T. Full details about the 
proposed mean structure are described in the following section. 

Given /, we assume that outcomes are independent across nanoparticles 
and, for any nanoparticle i, CoY{sik{d), Sik' {d')) = ^ with ipo G [0,1]. 

In this setting, two outcomes associated with the same nanoparticle at sim¬ 
ilar doses are assumed to be more correlated than measurements taken at 
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distant doses, for any replicate. The major advantage of this assumption 
is related to a reduced representation of a high-dimensional covariance ma¬ 
trix, which is now fully characterized in terms of a 1-dimensional variance 
parameter cr^ and a 1-dimensional correlation 

2.2. Mean structure. The binary tree T recursively splits the predictor 
space into two subspaces, according to criteria of the form x.j < a vs x.j > a, 
for a G M and j G {1,... ,p}. Each split defines two new nodes of the tree, 
corresponding to two newly created subspaces of predictors. Let n be the 
set of terminal nodes of tree T. 

We model the dose-response curves in each terminal node as a linear 
combination of spline basis functions. Unlike parametric models such as log- 
logistic, spline functions do not assume a particular shape for the curve. 
This makes our model fully applicable to sub-lethal biological assays, which 
are not expected to follow a sigmoidal dose-response dynamic. However, if 
needed, the spline model can easily allow for possible shape constraints, 
such as monotonicity, by using a modified basis [Ramsay (1998)]. This flex¬ 
ibility makes the use of spline basis representations potentially preferable 
to Gaussian process priors or similar smoothers. A formal comparison is, 
however, outside the scope of this manuscript. Our chosen functional repre¬ 
sentation is easily extended to two-dimensional response surfaces (Section 5). 
Let Hi (•),..., HmD+(5(') denote tud + ^ uniform B-spline basis functions of 
order 5 on [0,11], with rriD fixed knots. Following Eilers and Marx (1996), 
we avoid choosing the location of spline interior knots by deliberately over¬ 
fitting curves with a number of knots coinciding with the dose-escalation 
grid. Adaptive smoothness is determined by using a penalty on adjacent 
coefficients, via a smoothing prior that will be presented in Section 3. 

If Xj is in the subset corresponding to the rth terminal node of T, /(x*, d) = 
We will denote with (3^ = (/3ri,... ,l3rmB+&)' vector of 
splines coefficients defining the expected dose-response trajectory in the rth 
terminal node. Furthermore, we let (3 define the random set of spline coef¬ 
ficients, including from all terminal nodes (r = l,...,n). The Bayesian 
model is completed by prior distributions on T, /3, cr^ and (pD- 

3. Prior model. We first introduce the general dependence structure of 
the prior, before describing each parameter’s prior distribution. We follow 
Chipman, George and McGulloch (1998), and assume that the tree is inde¬ 
pendent of variance components cr^ and p d '■ 

(3.1) p{T,f3,a‘^,pD) =p{T,P)p{a^)p{pD) = p{f3\T)p{T)p{a^)p{pD)- 

Moreover, conditionally on T, terminal node parameters are assumed inde¬ 
pendent: p{(3\T) = Y\YiP{Pr\T~)- Therefore, the prior is fully determined by 
a tree prior p{T), terminal node parameters priors p(/3^|T), and variance 
parameters priors p{cr'^) and p{ip£)). 


6 


C. LOW-KAM ET AL. 


3.1. Tree prior. The tree prior p{T) is implicitly described by the stochas¬ 
tic tree-generating process of Chipman, George and McCulloch (1998), where 
each new tree is generated according to the following: (i) the probability for 
a node at depth q to be nonterminal, given by a(l -|- {q = 1,2,...), (ii) 
the probability for a node to split at a predictor x.j, {j = 1,... ,p), given by 
the discrete uniform distribution on the set of available predictors, and (iii) 
given the predictor x.j, the probability for a node to split at a value a, given 
by the discrete uniform distribution on the set of available splitting values. 
Probability (i) is a decreasing function of q, making deeper nodes less likely 
to split and favoring “bushy” trees. Chipman, George and McCulloch (1998) 
give guidelines to choose parameters a and by plotting the marginal prior 
distribution of the number of terminal nodes. In (ii) and (iii), predictors and 
splits are available if they lead to nonempty child nodes. 


3.2. Terminal node splines coefficients prior. We follow Lang and Brezger 
(2004) and consider a conditionally conjugate P-spline prior: f3^\T,T‘^ oc 
exp(—^/3(,iL^/3^), where is an additional smoothing variance param¬ 
eter and 


(3.2) 


K0 


( ^ \ 

-1 2 -1 


-1 2 -1 

V -11/ 


is a penalty matrix of size [mo+ 6) x [m dTS), corresponding to a first order 
random walk. Note that this prior is improper, as the matrix Kj^ is not of full 
rank. In order to work with a proper prior in a model comparison setting, 
we replace the first and last element of the diagonal with 1 + rj, where rj is 
a small constant. The model is completed by assigning a conjugate Inverse- 
Gamma hyperprior to the smoothing parameter t^|T~IG(ot-,6t)- 


3.3. Variance components priors. We assume lG{aa,ba). For ipo, 
we choose the conjugate prior described by Rowe (2003) for autoregressive 
covariance matrices, with truncated support on [0,1]. Let 0 = di < • • • < 
drijj = D he the dose-escalation design sequence: 

/n Q\ ( \ 2 \-(no-l)/2 ( Aoi - ¥?nAo2 + </?|»Ao3 

(3.3) p{^d)o^{1-Td) expj^- 2{\ - ) - 

where I is the indicator function, A = {^vv')i<v,v><no is a hyperparameter 
matrix, and (Aqi, A 02 , Aqs) are defined through its diagonal, subdiagonal, and 
superdiagonal elements as follows: Aqi = A^.^, A 02 = + 

A^+i.,,), Ao 3 = practice, we choose A = Idn^ > the identity ma¬ 

trix of size no x no, to put more weight on low values ol ipo and assume 
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weak prior correlations between responses at different doses. This last dis¬ 
tribution completes the prior model. We now turn to posterior inference on 
parameters, given the observations. 

4. Posterior inference through MCMC simulation. We are interested 
in the posterior distribution p(T,/3,(T^,(/?z),r^|y). The rest of this section 
describes a Markov chain Monte Carlo algorithm for sampling from this 
distribution, as the number of potential trees prevents direct calculations. 
Our Gibbs sampler is adapted from the algorithms of Chipman, George and 
McCulloch (1998) and Gramacy and Lee (2008), with changes due to the 
specific structure of our model. 

At each iteration, the algorithm performs a joint update of (T,/3), condi¬ 
tionally on the rest of the parameters, followed by standard Gibbs component¬ 
wise updates of each variance parameter. The joint tree and terminal nodes’ 
spline coefficients update is decomposed into 

(4.1) T|y,T^,y?z),r^; followed by 

(4.2) /3^|r,y,o-^,<y9D,T^; for rG {l,...,n}. 

The draw of T in (4.1) is performed by the Metropolis-Hastings algorithm 
of Chipman, George and McCulloch (1998), which simulates a Markov chain 
of trees that converges to the posterior distribution p(T|y, cr^, r^). The 
proposal density suggests a new tree based on four moves: grow a terminal 
node, prune a pair of terminal nodes, change the split rule of an internal 
node, and swap the splits of an internal node and one of its children’s. 

The target distribution can be decomposed as follows: 

P(7'|y,cr^ r^) 

(4.3) 

ocp(r) J p{y\P,T, a‘^,pD,T‘^)p{P\T, a'^,ipD,T^)dl3. 

The expression for the integral above is given in Low-Kam et al. (2015), in 
a closed form by conjugacy of the prior on j3 = {/3^,..., /3„}. Therefore, the 
draw of T in (4.1) does not require a reversible-jump procedure for spaces 
of varying dimensions, even if nodes are added or deleted. The proposal 
density of the Metropolis-Hastings algorithm can be conveniently coupled 
withp(T) to simplify calculations [Chipman, George and McGulloch (2002)]. 
Full conditional distributions for f3i,..., /3„ in (4.2) and variance parameters 
ifD and are available in Low-Kam et al. (2015). 

Given posterior samples, predictive statistics are easily obtained via Monte 
Carlo simulation of p(y*|y), for i = 1,...,I. More precisely, let x* = x*. At 
each iteration ^ = 1,...,AI, the MCMC algorithm performs a draw from 

p(T,/3,o'^,(/3i:),T^|y), followed by a draw of from the multivariate nor¬ 
mal distribution p(yj^^*|T,/3, In our case studies (Section 6), for 
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example, we compare posterior summaries from the predictive distribution 
p{y*f.{d)\y) to observed dose-response data yik{d). We perform two series of 
posterior predictive checks: in the first one, the generated predictive sam¬ 
ples are conditioned on the full set of dose-response curves, via the tree. 
The objective is to assess model adequacy and calibration. The second se¬ 
ries studies model prediction accuracy using a leave-a-curve-out validation 
scheme, where each data curve is compared to the corresponding predictive 
sample obtained by htting the tree on the remaining curves. 

Posterior inference based on Monte Carlo samples is also used to derive 
inferential summaries about nontrivial functionals of the parameter/model 
space. The marginal effect of a physico-chemical property x,j on the response 
can be represented by the partial dependence function of Friedman (2001): 
let xij ,..., xsj be a grid of new values for x,j. Then the partial dependence 
function is f{xsj,d, Xsj ) . . . , Xip ), d, t')')/1, 

where Xiji is the ith observation of x.j/ in the data. For all doses, plotting 
the average of this function over Monte Carlo draws provides a visualization 
of the marginal effect of x,j. This partial dependence function can also be 
extended to account for the joint marginal effect of two variables. 

Similarly, posterior realizations y\x can be used to report importance 
scores for each variable. For all j € {1,..., P}, Sj = and Tj = 

first-order and total sensitivity indices for variable x,j, 
and represent the main and total influence, respectively, of this variable on 
the response [Gramacy, Taddy and Wild (2013)]. Unlike other metrics such 
as the variance reduction attributed to splits on the variable, sensitivity 
indices are robust to leaf model specihcations and are therefore adapted for 
a dose-response leaf model. Both indices are defined given an uncertainty 
distribution on the inputs, usually the uniform distribution on the covariates 
space. We follow Gramacy, Taddy and Wild (2013) and use a Monte Carlo 
scheme to approximate Sj and Tj, that accounts for unknown responses by 
using predicted values for a Latin hypercube sampling design. 

5. Extending the model to two-dimensional toxicity profiles. More gen¬ 
eral exposure escalation protocols involve the observation of a biological 
outcome y in association with a prescription of dose escalation d G [0,11], 
observed for a series of exposure times t € [to,T]. Letting k, {k = 1,..., K) be 
a replication index, we define yik{d, t) as the outcome of interest, evaluated at 
dose d, time t and extend the model in (2.1): yik{d,t) = f{xi,d,t) + eik{d,t), 
where / is a random mean response surface and £ik{d,t) Ni0,al). To 
account for dependence between doses and durations of exposure, for each 
nanoparticle z, we assume CoY{eik{d,t),£ik'{d'fi')) where 

ifD G [0,1] and ipT & [0,1] are autocorrelation parameters. 
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The response surface / in the terminal nodes of T is modeled by a ten¬ 
sor product of two one-dimensional P-splines [Lang and Brezger (2004)]. 
Let Bi(-),..., dehned as in Section 2.2 and Bi(-),..., de¬ 

note rriT + C B-spline basis functions of order on [to,T], with rriT fixed 
knots. Then, if Xj is in the subset corresponding to the rth terminal node 
of Tj, f{xi,d,t) = ^rirnBi{d)Brn{t), where = {Prll,---, 

Pr{mo+S){mT+oy ^ vector of Spline coefficients associated to the rth ter¬ 
minal node. 

The prior model has the same global dependence structure as in Section 3, 
but now includes an additional independent term (pT for time-covariance. 
Let to = < • • • < triT = T he the sequence of exposure times when toxicity 

was measured. We adapt prior (3.3) to preserve conjugacy and introduce a 
similar distribution for ipT- 

/c-.x / X 2 x-(nT(no-l ))/2 f Aqi - Ao 2 + ¥^|)Ao 3 \, 

(5.1)p(<^d)oc( 1-(/?^) ^ ^ exp|^- 2(1 - ) - J^oelo,!] 

/. „X / X 2 \-(nD(nT-l ))/2 f lOl - V>Tl02 + ^t ^03 

{5.2) p{(pT} (X {1 - p)t) expj^- _ ^2 ) - 

where 701, 702 and 703 are obtained by summing elements of the diagonal, 
subdiagonal, and superdiagonal of matrix parameter prior T, constructed 
following the guidelines introduced in Section 3.3. For the terminal nodes’ 
spline coefficient priors, we use a spatial extension of Besag and Kooperberg 
(1995), a first order random walk prior based on the four nearest neigh¬ 
bours of splines coefficients, with appropriate changes for corners and edges: 
/3^|T, oc exp(—where Kp is a penalty band matrix of size 
{m£) + 5){mT + C) X (™T> + d){mT + Q), which extends matrix (3.2) to the 
two-dimensional case. For posterior inference, we add a step to generate (pT 
in the Gibbs sampler of Section 4. 

6. Applications. A simulation study to assess model performance is de¬ 
scribed in Low-Kam et al. (2015). In the rest of this section we illustrate our 
approach with experimental results from a case study reported by Zhang 
et al. (2012), measuring toxicity of 24 metal oxides on human bronchial 
epithelial (BEAS-2B) cells. 

6.1. Case studies background. After 24 h, Lactate Dehydrogenase (LDH) 
release was used to measure the death rate of cells exposed to eleven doses 
of metal oxides (from 0 to 200 pg/ml), evenly spaced on the logarithmic 
scale. Cell death is commonly used to screen for ENM cytotoxicity without 
reference to a specific mechanism. Figure 1 shows the LDH dose-responses 
curves for the 24 metal oxide nanoparticles. In a second assay, Propidium 
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Iodide (PI) fluorescence was used to indicate the percentage of cells expe¬ 
riencing oxidative stress through cellular surface membrane permeability, 
across the same ten doses and after six times of exposure (from 1 to 6 h, 
at every hour). Figure 2 shows a heatmap representation for the PI assay, 
for all metal oxides, doses, times, and replicates, where responses are color- 
coded from light (low) to dark (high). In both assays, seven metal oxides 
(C 03 O 4 , CoO, Cr 203 , CuO, Mn 203 , Ni 203 and ZnO) display a notable rise 
for the higher doses, suggesting toxicity. 

All metal oxides are characterized by six physico-chemical properties of 
potential interest to explain toxicity profiles: nanoparticle size in media, a 
measure of the crystalline structure (b(A)), lattice energy (AFfiattice); which 
measures the strength of the bonds in the nanoparticles, the enthalpy of 
formation {AH]if^n+), which is a combined measure of the energy required 
to convert a solid to a gas and the energy required to remove n electrons from 
that gas, metal dissolution rate, and conduction band energy (the energy to 
free electrons from binding with atoms). 
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Fig. 2. Heatmap for PI fluorescence assay, color-coded from light (low) to dark (high). 
Each row corresponds to a nanoparticle at one dose across 6 times (1 to 6 h) and 4 
replicates. For each nanoparticle, there are 11 rows, one for each dose (0 to 200 yg/ml), 
arranged from bottom to top. 


In our analysis, we use cubic splines, that is, <5 = C = 4, and place in¬ 
terior knots at each intermediate dose from 0.39 to 100 pg/ml. Therefore, 
np) = niD + “2^ and np = mp + 2. For the tree prior, we adopt the default 
choice of Chipman, George and McCulloch (1998), (a,z/) = (0.95,2), which 
puts more weight on trees of size 2 or 3. We place relatively diffuse priors 
Gamma(l, 1) on precision parameters 1/r^ and l/u^. We choose A = Idnjy 
and r = Idnj,, assuming no prior correlations between measurements at dif¬ 
ferent doses and times. Finally, moves “Grow,” “Prune,” “Change” and 
“Swap” of the Metropolis-Hastings tree-generating algorithm have probabil¬ 
ities 0.1, 0.1, 0.6 and 0.2, respectively. We used a total of 160,000 iterations. 
After discarding 80,000 iterations for burn-in, the remaining samples for es¬ 
timation were thinned to save computer storage. The rest of this section 
shows the results obtained on LDH and PI assays. 

6.2. LDH dose-escalation assay. Figure 3 (top) shows both sensitivity 
indices described in Section 4 for the six physico-chemical properties. Fig¬ 
ure 3 (bottom) shows the combined marginal effect of conduction band en¬ 
ergy and dissolution on LDH, obtained with the partial dependence function 
of Friedman (2001), and color-coded from light (low) to dark (high), for dose 
200 pg/ml. The tree isolates a first region of high toxicity, corresponding to 
ENM with high dissolution rates (ZnO and CuO). This region corresponds 
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to the first mechanism of toxicity identified by Zhang et al. (2012): highly 
soluble metal oxides, such as ZnO and CuO, are more likely to release metal 
ions and disturb the cellular state. A second region of toxicity on Figure 3 
(left) includes metal oxides C 03 O 4 , CoO, Cr 203 , Mn 203 and Ni 203 , with Ec 
values ranging from —4.33 eV for Mn 203 to —4.59 eV for Ni 203 . This region 
matches the second mechanism for toxicity described by Zhang et al. (2012): 
the overlap of the conduction band energy of the metal oxides with the bio¬ 
logical redox potential of cells, ranging from —4.12 to —4.84 eV. When these 
two energy levels are alike, transfer of electrons from metal oxides to cells is 
facilitated, disturbing the intracellular state. Note that Figure 3 (bottom) 
also shows an additional split that isolates Mn 203 , whose toxicity for the 
LDH assay is more comparable to ZnO and CuO (see Figure 1). Similar 
figures for other doses are included in Low-Kam et al. (2015). The LDH 
assay illustrates how threshold effects and interactions of physico-chemical 
properties are accurately captured by a tree structure. 

We perform posterior predictive checks for model fitting. Figure 4 shows 
the expected posterior predictive dose-response curves for two nontoxic metal 
oxides (Ce 02 and Fe 304 ) and two toxic ones (Cr 203 and ZnO), with the as¬ 
sociated 90% intervals. All four intervals provide good coverage for the orig¬ 
inal data. The other 20 curves exhibit similar behavior and can be found in 
Low-Kam et al. (2015). We also study the prediction accuracy of the model 
using a leave-a-curve-out validation framework. Results for Ce 02 , Fe 304 , 
Cr 203 and ZnO are presented in Low-Kam et al. (2015). While leave-one- 
out predictions recover general trends, in some cases we observe suboptimal 
coverage, especially in sparse areas of the physico-chemical spectrum. For ex¬ 
ample, nanoparticles ZnO and CuO alone determine tree splits on the metal 
dissolution parameter and, once removed, cannot be accurately predicted by 
the model. 

Finally, the proposed methodology is compared for validation to the 
Bayesian Additive Regression Trees (BART) method of Chipman, George 
and McCulloch (2010a), a sum-of-tree extension of Chipman, George and 
McCulloch (1998), with the R package “BayesTree” [Chipman, George and 
McCulloch (2010b)]. As BART model one-dimensional responses, we use 
the area under the LDH curves (AUC) as the dependent variable. In Chip- 
man, George and McCulloch (2010a), the proportion of all splitting rules at¬ 
tributed to a variable at each draw on all trees, averaged over all iterations, 
is proposed as a measure of variable importance, when the number of trees is 
small. Results are presented in Low-Kam et al. (2015). Variable importance 
scores and marginal effects from BART are similar to those obtained with 
our method and confirm that the AUC is an accurate summary for toxicity 
for the LDH assay. The first advantage of using a dose-response leaf model 
instead of the AUC is that we avoid preliminary assessment of the data for 
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Fig. 3. LDH assay. (Top) First order (S) and total (T) sensitivity indices for the six 
physio-chemical properties in the LDH assay. (Bottom) 2-dimensional partial dependence 
function for marginal effect of metal dissolution (log scale) and conduction band energy in 
the LDH assay at 200 gg/ml. The toxicity response is color-coded from light (low) to dark 
(high). The figure also shows the projections of the 2f metaloxides in this subspace. 
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Fig. 4. Posterior predictive curves for Ce02, Fe^Ot, Cr 203 and ZnO. The points are 
the observed replicates and the dashed line is the average observed response. The expected 
posterior predictive curve and 90% interval are in solid lines. 
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choosing a summary over another: toxicologists usually report several tox¬ 
icity parameters (EC50, slope), as they may convey different information. 
The second advantage is better understood from a predictive perspective, 
as our model allows for full dose-response dynamics instead of the AUC. 
A comparison with the Treed Gaussian Process, using the R package tgp 
[Gramacy and Taddy (2010)], is also included in Low-Kam et al. (2015). 
After tuning tgp to forbid splitting on dose (basemax, splitmin), we can 
indeed reproduce the essential structure of our model using this well-tested 
R library. Our findings proved to be robust to differing details in the prior 
specification, as the model fit with tgp also captures the marginal effects of 
the predictors metal dissolution and conduction band energy on toxicity. 

6.3. PI general exposure assay. Figure 5 (top) shows the variable sen¬ 
sitivity indices of the six physico-chemical properties. Figure 5 (bottom) 
illustrates the marginal effect of both conduction band energy and dissolu¬ 
tion on membrane damage, calculated with the partial dependence function, 
and color-coded from light to dark, for dose 200 pg/ml and time 6 h. The tree 
model for PI assay also identifies the two areas of toxicity indicated in Zhang 
et al. ( 2012 ), corresponding to highly soluble metal oxides and nanoparticles 
whose conduction band energy overlaps with cellular redox potential range. 
Additional figures for marginal effect of conduction band energy and metal 
dissolution, for all doses and times, are included in Low-Kam et al. (2015). 
The similarity of variable importance scores and marginal effect of conduc¬ 
tion band energy and dissolution obtained for LDH and PI assays indicates a 
strong correlation between these assays for nanoparticle toxicity assessment, 
as noted by Zhang et al. (2012). Figure 6 illustrates the posterior predictive 
90% surface intervals for two nontoxic metal oxides (La 203 and Ti 02 ) and 
two toxic ones (C 03 O 4 and CuO), showing good posterior coverage over all 
doses and times of exposure. Similar surfaces for the other 20 metal oxides 
are plotted in Low-Kam et al. (2015). Leave-a-surface-out predictions for 
La 203 , Ti 02 , C 03 O 4 , and CuO are presented in the appendix and show the 
limitations of the model for prediction when extrapolating to sparse areas 
of the covariate space, similar to what we observed in the LDH assay. 

7 . Discussion. We propose a Bayesian regression tree model to define 
relationships between physico-chemical properties of engineered nanoma¬ 
terials and their functional toxicity profiles in dose-escalation assays. As 
demonstrated by the case studies, the tree structure is adapted to account 
for flexible models of structure-activity relationships, such as threshold ef¬ 
fects and interactions. The proposed model integrates information across all 
doses and replicates, and therefore is adapted to small sample sizes usually 
found in nanotoxicology data sets. Monte Carlo integration over the model 
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Fig. 5. PI uptake. (Top) First order (S) and total (T) sensitivity indices for the six 
physio-chemical properties in the PI assay. (Bottom) 2-dimensional partial dependence 
function for marginal effect of metal dissolution (log scale) and conduction band energy in 
the PI assay at 200 gg/ml and 6 h. The toxicity response is color-coded from light (low) to 
dark (high). The figure also shows the projections of the 2) metaloxides in this subspace. 
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Fig. 6. Posterior predictive surfaces for La 2 03 , Ti 02 , C 03 O 4 , and CuO. The solid line 
is the expected posterior predictive surface with the associated 90% interval. The points 
are the observed data replicates. 


space provides straightforward inference on nontrivial functionals of param¬ 
eters of interest and prediction of full dose-response curves from nanoparticle 
characteristics. The smoothing splines representation allows for easy exten- 


Time Time 
































18 


C. LOW-KAM ET AL. 


sion of the model to two-dimensional toxicity profiles of general exposure 
escalation assays as well as for modeling sub-lethal outcomes. 

The convergence of Bayesian tree models should be carefully assessed 
for all applications of the proposed methodology. The four moves of the 
Metropolis-Hastings algorithm of Chipman, George and McCulloch (1998) 
work well in our simulations and case studies, however, other applications 
might require additional moves to move faster through the tree space and 
improve convergence [see, e.g., Gramacy and Lee (2008), Wu, Tjelmeland 
and West (2007)]. As illustrated in Section 6, another potential pitfall of 
the model is its predictive performance for sparsely explored nanoparticle 
characteristics. This issue is not specific to our model and possible improve¬ 
ments would be obtained by combining multiple studies in a meta-analysis 
framework, with the appropriate adjustments for data heterogeneity or for¬ 
malizing explicit prior knowledge about hazardous nanoparticle properties. 

As seen in the case study for cell death and cellular membrane perme¬ 
ability, different toxicity mechanisms can be closely related. Therefore, an 
important opportunity for model extensions would be to combine different 
biological assays in a single analysis, the final goal being that of understand¬ 
ing if nanoparticles physical and chemical properties have a differential effect 
on different cellular injury pathways. This would require more sophisticated 
modeling strategies that will be more likely to be useful if technological ad¬ 
vances will allow for feasible screening of much larger nanomaterial libraries. 
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reflect the views of the National Science Foundation or the Environmental 
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SUPPLEMENTARY MATERIAL 

Additional results for online publication (DOI: 10.1214/14-AOAS797SUPPA 
.pdf). This appendix provides full conditional distributions and additional 
experimental results. 

Code (DOI: 10.1214/14-AOAS797SUPPB; .zip). This folder contains a 
G-|—|- implementation of the algorithm. 
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